An Empirical Model of Multiword Expression Decomposability
Abstract
This paper presents a construction-inspecific model of multiword expression decomposability based on latent semantic analysis (LSA). We use LSA to determine the similarity between a multiword expression and its constituent words, and claim that higher similarities indicate greater decomposability. We test the model over English noun-noun compounds and verb-particles, and evaluate its correlation with similarities and hyponymy values in WordNet. Based on mean hyponymy over partitions of the data ranked by similarity, we furnish evidence that the calculated similarities are correlated with the semantic relational content of WordNet.
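The two computations in the abstract (MWE-to-constituent similarity in an LSA space, and mean hyponymy over similarity-ranked partitions) can be sketched as follows. This is a minimal sketch, assuming a generic LSA built from a truncated SVD of a term-by-context count matrix rather than the paper's exact configuration. The helper names `lsa_space`, `decomposability`, and `mean_by_partition`, the toy counts, and any hyponymy values passed in are hypothetical stand-ins for real corpus counts and WordNet-derived scores.

```python
import numpy as np


def lsa_space(counts: np.ndarray, k: int) -> np.ndarray:
    """Map each row (term) of a term-by-context count matrix into a
    k-dimensional LSA space using a truncated SVD. Generic LSA: the
    paper's exact weighting and dimensionality are NOT assumed here."""
    u, s, _vt = np.linalg.svd(counts, full_matrices=False)
    return u[:, :k] * s[:k]  # scale left singular vectors by singular values


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0 or nb == 0:
        return 0.0
    return float(a @ b / (na * nb))


def decomposability(vectors, index, mwe, constituents):
    """Similarity between an MWE and each of its constituent words.
    Under the paper's claim, higher values suggest greater decomposability."""
    return {w: cosine(vectors[index[mwe]], vectors[index[w]]) for w in constituents}


def mean_by_partition(scores, values, n_parts):
    """Rank items by similarity (highest first), split the ranking into
    n_parts roughly equal partitions, and return the mean value of each
    partition (e.g. a hypothetical per-item hyponymy score)."""
    order = np.argsort(scores)[::-1]
    values = np.asarray(values, dtype=float)
    return [float(values[part].mean()) for part in np.array_split(order, n_parts)]


if __name__ == "__main__":
    # Hypothetical toy counts: rows are terms, columns are contexts.
    terms = ["stock_market", "stock", "market", "look_up", "look", "up"]
    counts = np.array([
        [5, 4, 0, 1],
        [4, 5, 0, 0],
        [3, 2, 1, 1],
        [0, 1, 4, 3],
        [1, 0, 5, 1],
        [2, 2, 2, 2],
    ], dtype=float)
    index = {t: i for i, t in enumerate(terms)}
    vecs = lsa_space(counts, k=2)
    print(decomposability(vecs, index, "stock_market", ["stock", "market"]))
    print(decomposability(vecs, index, "look_up", ["look", "up"]))
```

Comparing the per-partition means returned by `mean_by_partition` is one way to check whether higher-similarity partitions also carry higher hyponymy values, in the spirit of the evaluation the abstract describes.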
Similar papers
Discriminative Strategies to Integrate Multiword Expression Recognition and Parsing
The integration of multiword expressions in a parsing procedure has been shown to improve accuracy in an artificial context where such expressions have been perfectly pre-identified. This paper evaluates two empirical strategies to integrate multiword units in a real constituency parsing context and shows that the results are not as promising as has sometimes been suggested. Firstly, we show th...
Parsing Models for Identifying Multiword Expressions
Multiword expressions lie at the syntax/semantics interface and have motivated alternative theories of syntax like Construction Grammar. Until now, however, syntactic analysis and multiword expression identification have been modeled separately in natural language processing. We develop two structured prediction models for joint parsing and multiword expression identification. The first is base...
Machine Translation of Non-Contiguous Multiword Units
Non-adjacent linguistic phenomena such as non-contiguous multiwords and other phrasal units containing insertions, i.e., words that are not part of the unit, are difficult to process and remain a problem for NLP applications. Non-contiguous multiword units are common across languages and constitute some of the most important challenges to high quality machine translation. This paper presents an...
Word Sense Disambiguation Based on Weight Distribution Model with Multiword Expression
This paper proposes a two-phase word sense disambiguation method, which filters only the relevant senses by utilizing the multiword expression and then disambiguates the senses based on a Weight Distribution Model. A multiword expression usually constrains the possible senses of a polysemous word in a context. The Weight Distribution Model is based on the hypotheses that every word surrounding a polyse...
Towards an Empirical Subcategorization of Multiword Expressions
The subcategorization of multiword expressions (MWEs) is still problematic because of the great variability of their phenomenology. This article presents an attempt to categorize Italian nominal MWEs on the basis of their syntactic and semantic behaviour by considering features that can be tested on corpora. Our analysis shows how these features can lead to a differentiation of the expressions ...